Graph Clustering for Keyword Search

Authors

  • Rose Catherine
  • S. Sudarshan
Abstract

Keyword search on data represented as graphs has received a lot of attention in recent years. Early keyword search systems assumed that the graph is memory resident. However, in some applications the graph can be much larger than the available memory. This led to the development of search algorithms that search a smaller, memory-resident summary graph (the supernode graph) and fetch parts of the original graph from disk only when required. In this scenario, good clustering of nodes into supernodes when constructing the summary graph is key to efficient search. In this paper, we address the issue of graph clustering for keyword search using a technique based on random walks. We propose an algorithm, which we call the Modified Nibble clustering algorithm, that improves upon the previously proposed Nibble algorithm, and we outline several policies that can improve its performance. We then compare our algorithm with two previously proposed graph clustering algorithms, EBFS and kMetis. Our performance metrics include edge compression, keyword search performance, and the time and space overheads of clustering. Our results show that Modified Nibble outperforms EBFS uniformly and outperforms kMetis in some settings. Further, the memory requirements of our algorithm are much lower than those of kMetis, making it practical even for graphs with a very large number of nodes, unlike kMetis.
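To make the supernode graph and the edge compression metric concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes that a clustering is given as a plain node-to-cluster mapping and that edge compression is the ratio of distinct original edges to distinct edges between supernodes; the abstract does not define the metric, so that definition is an assumption. The function names `build_supernode_graph` and `edge_compression` are hypothetical.

```python
from collections import defaultdict


def build_supernode_graph(edges, cluster_of):
    """Collapse a directed graph into a summary (supernode) graph.

    edges      -- iterable of (u, v) node pairs
    cluster_of -- dict mapping each node to its cluster (supernode) id
    Returns (members, superedges): the nodes in each supernode, and the
    set of distinct edges between different supernodes.
    """
    members = defaultdict(set)
    for node, cluster in cluster_of.items():
        members[cluster].add(node)

    superedges = set()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # edges inside a supernode vanish from the summary
            superedges.add((cu, cv))
    return dict(members), superedges


def edge_compression(edges, superedges):
    """Assumed definition: distinct original edges per summary edge."""
    if not superedges:
        return float("inf")  # every edge fell inside a single supernode
    return len(set(edges)) / len(superedges)


# Toy graph: two triangles joined by two cross edges.
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4), (1, 4)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

members, superedges = build_supernode_graph(edges, cluster_of)
ratio = edge_compression(edges, superedges)
print(members, superedges, ratio)
```

Under this assumed definition, a clustering that keeps densely connected nodes in the same supernode leaves fewer edges in the summary graph and so yields a higher ratio. That is one way to read the abstract's claim that good clustering is key to efficient search.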

Related resources

An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Exploiting Semantic Result Clustering to Support Keyword Search on Linked Data

Keyword search is by far the most popular technique for searching linked data on the web. The simplicity of keyword search on data graphs comes with at least two drawbacks: difficulty in identifying results relevant to the user intent among an overwhelming number of candidates and performance scalability problems. In this paper, we claim that result ranking and top-k processing which adapt sche...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches to software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging similar clusters (or splitting dissimilar ones). These techniques suffered from the drawba...

Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the application on the Internet. However, because of information management hardware, the high volume of data, and word similarities across different fields, most answers to users' questions aren't correct. Web graph clustering and the placement of clusters in the corresponding answers therefore help users reach their intended results. Community (web communit...

Keyword Generation for Lyrics

This paper proposes a scheme for content-based keyword generation from song lyrics. Syntactic as well as semantic similarity is used for sentence-level clustering to separate the topic from the background of a song. A method is proposed to search for a center in the semantic graph of WordNet for generating keywords not contained in the original text.

Publication date: 2009